The BioNLP-ST challenges on information extraction and knowledge acquisition in biology
Speakers: Robert Bossy and Jin-Dong Kim
Abstract
We propose a machine learning approach for the semantic recognition and normalization of clinical term descriptions. The clinical terms considered here are noisy Spanish-language descriptions written by health care professionals in our electronic health record system. These descriptions include clinical findings, family history, and suspected diseases, among other concept categories. They are usually very short texts with high lexical variability, including synonyms, acronyms, abbreviations, and typographical errors. Mapping them to normalized descriptions requires medical expertise, which makes a rule-based knowledge-engineering approach difficult to develop. To build a training dataset, we use descriptions that terminologists have previously matched to the hospital thesaurus database. We generate feature vectors from pairs of descriptions, capturing both the individual characteristics of each description and their joint characteristics. We also propose an unsupervised learning approach to discover term equivalence classes, including synonyms, abbreviations, acronyms, and frequent typographical errors. We evaluate different feature combinations to train MaxEnt and XGBoost models. Our system achieves an F1 score of 89% on the Hospital Italiano de Buenos Aires (HIBA) problem list.
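The pair-based feature generation described above can be illustrated with a minimal sketch. The abstract does not list the actual features, so everything here is an assumption. The helper names (`normalize`, `pair_features`, `initials`, etc.) are hypothetical. The particular features are illustrative stand-ins for the "individual and joint characteristics" of a description pair: lengths, character-trigram Jaccard, token Jaccard, normalized edit similarity, and an initials-based acronym check. Each resulting feature dictionary could then be passed to a pair classifier, such as a MaxEnt (logistic regression) or XGBoost model. That classifier would predict whether a raw description and a thesaurus entry denote the same concept.

```python
import unicodedata

# Hypothetical Spanish stopwords skipped when building acronyms (assumption).
STOPWORDS = {"de", "del", "la", "el", "los", "las", "y"}


def normalize(text: str) -> str:
    """Lowercase, strip accents, and collapse whitespace ('Crónica' -> 'cronica')."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.split())


def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Character n-grams of the space-padded string; tolerant of small typos."""
    padded = f" {text} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def initials(text: str) -> str:
    """First letter of each non-stopword token, e.g. for detecting 'epoc'."""
    return "".join(tok[0] for tok in text.split() if tok not in STOPWORDS)


def pair_features(raw: str, candidate: str) -> dict[str, float]:
    """Feature vector for a (raw description, thesaurus candidate) pair."""
    a, b = normalize(raw), normalize(candidate)
    ta, tb = set(a.split()), set(b.split())
    max_len = max(len(a), len(b)) or 1
    return {
        # Individual characteristics of each description.
        "len_raw": float(len(a)),
        "len_candidate": float(len(b)),
        "tokens_raw": float(len(ta)),
        # Joint characteristics of the pair.
        "char3_jaccard": jaccard(char_ngrams(a), char_ngrams(b)),
        "token_jaccard": jaccard(ta, tb),
        "edit_similarity": 1.0 - levenshtein(a, b) / max_len,
        "acronym_match": float(bool(a) and a.replace(" ", "") == initials(b)),
    }
```

For example, `pair_features("EPOC", "Enfermedad pulmonar obstructiva crónica")` sets `acronym_match` to 1.0. A misspelling such as "diabetes melitus" still scores high on `edit_similarity` against "Diabetes Mellitus", even though their token overlap is low.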
Similar resources
Overview of BioNLP Shared Task 2013
The BioNLP Shared Task 2013 is the third edition of the BioNLP Shared Task series that is a community-wide effort to address fine-grained, structural information extraction from biomedical literature. The BioNLP Shared Task 2013 was held from January to April 2013. Six main tasks were proposed. 38 final submissions were received, from 22 teams. The results show advances in the state of the art ...
Overview of BioNLP Shared Task 2011
The BioNLP Shared Task 2011, an information extraction task held over 6 months up to March 2011, met with community-wide participation, receiving 46 final submissions from 24 teams. Five main tasks and three supporting tasks were arranged, and their results show advances in the state of the art in fine-grained biomedical domain information extraction and demonstrate that extraction methods succ...
BioNLP shared Task 2013 - An Overview of the Bacteria Biotope Task
This paper presents the Bacteria Biotope task of the BioNLP Shared Task 2013, which follows BioNLP-ST-11. The Bacteria Biotope task aims to extract the location of bacteria from scientific web pages and to characterize these locations with respect to the OntoBiotope ontology. Bacteria locations are crucial knowledge in biology for phenotype studies. The paper details the corpus specifications, ...
New Resources and Perspectives for Biomedical Event Extraction
Event extraction is a major focus of recent work in biomedical information extraction. Despite substantial advances, many challenges still remain for reliable automatic extraction of events from text. We introduce a new biomedical event extraction resource consisting of analyses automatically created by systems participating in the recent BioNLP Shared Task (ST) 2011. In providing for the first...
Event Extraction for Post-Translational Modifications
We consider the task of automatically extracting post-translational modification events from biomedical scientific publications. Building on the success of event extraction for phosphorylation events in the BioNLP’09 shared task, we extend the event annotation approach to four major new post-translational modification event types. We present a new targeted corpus of 157 PubMed abstracts annotate...
Publication date: 2016